Abstract

We analyse graphs in which each vertex is 
assigned random coordinates in a geometric 
space of arbitrary dimensionality and only 
edges between adjacent points are present. 
The critical connectivity is found numerically 
by examining the size of the largest cluster. 
We derive an analytical expression for the 
cluster coefficient which shows that the graphs 
are distinctly different from standard random 
graphs, even for infinite dimensionality. Insights 
relevant for graph bi-partitioning are included. 
PACS: 05.10.Ln, 64.60.Ak, 89.75.Da
KEY WORDS: Networks, percolation, phase 
transitions, random graphs, scaling, graph 
bi-partitioning. 



1 Introduction 

Interest in complex networks has exploded over the last five years,
as data on very large networks such as the WWW, collaborations in the
scientific community, transportation, movie actor collaborations etc.
have become accessible.

Random graphs are often used to model complex networks. Ever since
Erdős and Rényi's groundbreaking work more than forty years ago,
intense theoretical research on random graphs has been taking place
[3, 11, 12, 13]. In contrast to random graphs, the interactions between
the sites in a lattice are usually between nearest neighbours,
reflecting a myopic world. Lattices are therefore often said to be at
the other end of the spectrum of network models [15].



Properties of real networks like robustness [16], growth [19], and
topology have attracted much attention, primarily from physicists. It
has been consistently shown that many of the networks possess small
world characteristics. Like random graphs, small world networks are
characterized by short average distances between any two sites and,
much like lattices, by a high degree of localness. However,
individually, random graphs and lattice models in their pure forms are
poor models of many real world networks. One could argue that
high-dimensional lattices have the necessary high clustering and low
average path length, though this has not been explored much [23]. In
the current paper we provide results on high-dimensional systems.

A random geometric graph (RGG) is a ran- 
dom graph with a metric. It is constructed by 
assigning each vertex random coordinates in a d- 
dimensional box of volume 1, i.e. each coordinate 
is drawn from a uniform distribution on the unit 
interval. RGGs have been used sporadically in 
real networks modeling [24] and extensively in 
continuum percolation [25, 26, 27, 28, 29], but almost exclusively in
two and three dimensions. Although RGGs are the continuum version of
lattices, they deserve some attention of their own, since percolating
continuum systems display behaviour that lattices are incapable of [30].
In addition, the connectivity in RGGs can be in- 
creased in a more natural way than by adding 
new bonds randomly in lattices. 

Recently, continuum percolation has been used in the study of the
stretched exponential decay of the correlation function in random walks
on fractals and the conjectured relation to relaxation in complex
systems [31]. However, continuous systems in general and RGGs in
particular are relevant whenever we need a multi-dimensional system
with a metric, as for example when modeling the spread of diseases.

In this paper we study RGGs in arbitrary dimensions. In low dimensions
the systems are dominated by local interactions. For higher dimensions
RGGs are usually believed to approach standard random graphs, which we
show is true only in some respects. We focus on 'phase transitions'
[34] at the percolation threshold by looking at the size of the largest
cluster, and we determine how the value of the critical parameter in
RGGs approaches that of random graphs as the dimension increases. We
also extract the distribution of cluster sizes in the critical region.
Furthermore, an expression for the cluster coefficient, a quantity that
has attracted much interest in network theory recently, is derived.
Results relevant for graph bi-partitioning are established. Finally, we
discuss how to implement random geometric graphs efficiently.

Figure 1: The size of the largest cluster in random graphs as a
function of the connectivity. Note that for N > 10^6 the Monte Carlo
data is almost indistinguishable from the theoretical result in
Eq. (3). Error bars are not shown since they are in all cases less than
the width of the lines. Inset: A closer look at the percolation
threshold a_c = 1.

The layout of this paper is as follows. In Sections 2 and 3 we describe
random graphs and random geometric graphs, respectively. In Section 4
we present our results, and Section 5 contains the details regarding
the implementation. Finally, in Section 6 we sum up.



2 Random Graphs 

Random graphs consist of N vertices (points/sites) and K edges (lines)
where each possible edge is present with probability p, i.e.
K = pN(N - 1)/2 (footnote 1). To keep the discussion independent of the
system size N, graphs are often characterized by the connectivity
(degree) a = 2K/N = pN, i.e. the average number of connections per
vertex, instead of K or p. As the connectivity increases, clusters of
vertices appear, where a cluster consists of all vertices linked
together by edges, directly or indirectly.

The size of the largest cluster in the macroscopic limit N → ∞ can be
calculated analytically. It is NG(a), where

    [...]    (1)

[...] we can invert Eq. (1), getting

    a = -\frac{\ln(1 - G)}{G},    (3)

from which it is trivial to show that a_c = 1. With Eq. (3) it is an
easy task to plot the fraction of vertices in the largest cluster (the
giant component) as done in Fig. 1, where we see the prototype of a
phase transition in combinatorial problems.
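As an illustration of how Eq. (3) can be used in practice, the following Python sketch evaluates the connectivity for a given giant-component fraction and inverts the relation numerically by bisection. The function names are hypothetical and the bisection is only one of several possible ways to invert the equation; it is not taken from the authors' program.

```python
import math


def connectivity_from_giant_fraction(G):
    """Eq. (3): connectivity a at which the largest cluster holds a fraction G."""
    if not 0.0 < G < 1.0:
        raise ValueError("G must lie strictly between 0 and 1")
    # log1p(-G) = ln(1 - G), evaluated accurately for small G
    return -math.log1p(-G) / G


def giant_fraction(a, tol=1e-12):
    """Invert Eq. (3) by bisection; below the threshold a_c = 1 the fraction is 0."""
    if a <= 1.0:
        return 0.0
    lo, hi = 1e-15, 1.0 - 1e-15
    # connectivity_from_giant_fraction is increasing in G, so bisection applies
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if connectivity_from_giant_fraction(mid) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Evaluating `giant_fraction` on a grid of connectivities gives a curve of the kind described for Fig. 1, with the fraction vanishing for a ≤ 1.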

In random graphs the probability distribution of edges p_k, i.e. the
probability that a vertex has k edges, is binomial

    p_k = \binom{N}{k} p^k (1 - p)^{N-k} \simeq \frac{a^k e^{-a}}{k!},    (4)

where the approximation resulting in the Poisson distribution is valid
for large system sizes N, which is exactly the limit in which we are
interested. The critical connectivity a_c for graphs with arbitrary
random degree distribution p_k has recently been derived by other
techniques than those originally leading to Eq. (3) [36, 37].
Unfortunately, we cannot use these results in connection with random
geometric graphs, as will become clear in the next section.

Footnote 1: From here on we take N - 1 ≈ N, in accordance with the
literature [35], since we are only investigating large systems.

Figure 2: A 2D random geometric graph with N = 500 and a = 5. The graph
is bi-partitioned (see Section 4). There are no edges across the
boundaries, i.e. the boundary conditions are open, not continuous.



3 Random Geometric Graphs 

A d-dimensional random geometric graph (RGG) is a graph where each of
the N vertices is assigned random coordinates in the box [0, 1]^d, and
only points 'close' to each other are connected by an edge. The degree
distribution of a RGG with average connectivity a is therefore given by
Eq. (4) as well. However, a RGG is a special kind of random graph with
properties not captured by the theoretical tools mentioned above. For
one thing, the probability that three vertices are cyclically connected
is different in random graphs and RGGs, regardless of the degree
distribution of the random graph.

RGGs are sometimes named spatial graphs. Fig. 2 illustrates a RGG in
2D. As in lattices, different boundary conditions can be applied. We
will see that toroidal (continuous) boundary conditions make a vital
difference compared to having open boundary conditions.

Figure 3: The critical distance in random geometric graphs in various
dimensions. Points within this distance of each other are connected by
an edge. The critical distance is equivalent to the radius R of the
excluded volume associated with each point.

The volume of a d-dimensional (hyper)sphere with radius r is

    V = \frac{\pi^{d/2} r^d}{\Gamma\left(\frac{d+2}{2}\right)},    (5)

where \Gamma(x) is the gamma function. This volume is needed in order
to find the edges in RGGs.

To 'visualise' a RGG in general, one can think of a box filled with
small spheres with radius r and volume V given by Eq. (5), where points
are connected by an edge only if the distance between their centers is
< 2r, i.e. if the spheres overlap. Since the total volume of our box is
1, the probability that two arbitrarily chosen vertices are connected
is equal to the volume of a sphere with radius R = 2r. In continuum
percolation theory this volume is denoted the excluded volume V_ex,
where V_ex = 2^d V in a RGG. The excluded volume is the basic quantity
of interest because it is directly related to the connectivity

    a = Np = N V_{ex},    (6)

from which it is clear why the connectivity is frequently called the
total excluded volume of the system. Eqs. (5) and (6) give us

    R = \frac{1}{\sqrt{\pi}} \left[ \frac{a}{N} \Gamma\left(\frac{d+2}{2}\right) \right]^{1/d}.    (7)

Fig. 3 shows the radius R of the excluded volume as a function of
N/a = 1/p = 1/V_ex. R decreases monotonically: for a given connectivity
a the spheres have to become smaller when more vertices are added to
the graph.

Eq. (7) provides us with the required relation between a and R when
creating a RGG. The distance between every pair of vertices must be
calculated, and an edge is added if the distance is less than R. Thus,
it seems unavoidable to have a runtime of O(N^2), making it unfeasible
to investigate as large systems as with random graphs (see Fig. 1),
where the number of calculations for a given a needed to create all the
edges is O(N). To overcome this obstacle we have designed a data
structure which is described in Section 5, with a runtime of O(N^β)
where β ≈ 1.3. This allows us to study RGGs with up to
N = 4^11 > 4 · 10^6 vertices, which is more than an order of magnitude
larger than usually accomplished [38].
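The relation between a and R, and the straightforward O(N^2) construction it leads to, can be sketched in Python as follows. This is a minimal illustration with open boundary conditions; the function names are hypothetical, and the radius is obtained by combining the sphere volume with a = N V_ex rather than copied from the authors' code.

```python
import math


def excluded_radius(a, n, d):
    """Radius R of the excluded volume for connectivity a, n vertices, dimension d."""
    v_ex = a / n  # a = N * V_ex
    # V_ex = pi^(d/2) R^d / Gamma(d/2 + 1), solved for R
    return (v_ex * math.gamma(d / 2 + 1)) ** (1.0 / d) / math.sqrt(math.pi)


def naive_rgg_edges(points, R):
    """All pairs of points closer than R (open boundaries), by brute force O(N^2)."""
    R2 = R * R
    edges = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist2 = sum((x - y) ** 2 for x, y in zip(points[i], points[j]))
            if dist2 < R2:
                edges.append((i, j))
    return edges
```

For example, `excluded_radius(5, 500, 2)` gives the connection radius for a 2D graph with N = 500 and a = 5; the brute-force pair loop is what becomes prohibitive for large N.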
4 Results

In our simulations of RGGs we define a_c to be the lowest connectivity
at which the fraction of vertices in the largest cluster is > 0 in the
macroscopic limit. We make the bold claim that the systems we are able
to analyse consist of enough points to make the critical connectivity
almost as sharply defined as in Fig. 1. However, our main purpose is
not to derive high precision percolation thresholds. Instead, we are
more interested in the critical connectivity as a function of the
dimension of the RGGs.

In this paper we express our threshold values in terms of a. Other
popular choices are the fractional volume s occupied by the spheres or
the density N of spheres. The relation between these parameters at the
percolation threshold is

    a_c = N_c V_{ex} = -2^d \ln(1 - s_c)    (8)

(see e.g. [£5| for a derivation). Usually, in con- 
tinuum percolation the volume V of each sphere 
is fixed while N is the independent variable in a 
system of size [0, L]^d. The approach of measuring
N_c or s_c for various values of L has been used
in both two [38] and three [39] dimensions, i.e.
for discs and spheres, where the critical values 
are determined by the use of finite size scaling. 
This procedure resembles site percolation in lat- 
tices. From the previous sections it is clear that 
we take a route closer to bond percolation in lat- 
tices by fixing L = 1 while tuning a for different 
values of N. In Section 5 we describe how this
has been carried out in practice. 

The Size of the Largest Cluster 

Let G_d(a) denote the fraction of vertices in the largest cluster in d
dimensions. Since a RGG in the limit of infinite dimension is often
assumed equivalent to a random graph, we expect that Eq. (3) provides
us with an expression for G_∞(a). But what does G_d(a) look like for
finite d? And what is the behaviour of a_c(d)? How does it approach
a_c(∞) as d increases? These are the questions addressed in this and
the following section.

Figs. 4 and 5 illustrate the average size of the largest cluster in
RGGs in 2, 3, 4, and 5 dimensions with and without toroidal boundary
conditions. The curves correspond to N = 4^k vertices with
k = 5, 6, ..., 11, where the larger systems display the sharpest
transitions. The legend in Fig. 5 applies to all diagrams in Figs. 4
and 5. In these 8 diagrams each curve is based on 300 data points. In
other words, G_d(a) is calculated in intervals of Δa = 0.005, resulting
in the smooth lines in the figures. For every data set we have averaged
over enough runs for error bars to be completely negligible.

Figure 4: The average fraction of vertices in the largest cluster for
various system sizes N (see the legend in Fig. 5) in random geometric
graphs with no edges across the boundaries. The inset in 2D illustrates
a finite size scaling (see the text). In higher dimensions the general
shape of the curves as N increases is nontrivial. Compare with Fig. 5.
Error bars are < 10^-3 for all curves and therefore omitted.

Figure 5: Like Fig. 4 but with continuous boundary conditions. We see
that the point at which the largest cluster becomes macroscopic is
sharply defined and can immediately be determined by the eye with high
precision (Table 1). The overall behaviour of the graphs for higher
dimensions is much closer to Fig. 1 than Fig. 4 is. As d increases the
a-interval where there is a significant difference between curves with
different N gets smaller and smaller. Error bars are < 10^-3 for all
curves and therefore omitted.

Since continuous boundary conditions mean addition of extra edges, the
size of the largest component G(a) obviously grows faster in Fig. 5
than in Fig. 4, especially in the smaller systems. These relatively few
extra edges make a decisive difference, connecting vertices not already
in the same cluster. Since toroidal systems are models of bulk systems,
G is much less N-dependent in that case. However 'unphysical' RGGs with
open boundaries may seem, they are the most popular RGG version in the
literature. Consequently, we consider them alongside the continuous
case.

Figure 6: Scaling of the critical connectivity as a function of the
dimension of the random geometric graphs reveals a power-law relation,
Eq. (9). For d ≤ 5 the data points are estimated by close inspection of
Fig. 5. For d > 5, a_c is based on runs with N = 4^10 points. Error
bars are included. See Table 1.

From Figs. 4 and 5 we see that the continuous boundary conditions make
the transition where G > 0 more abrupt, but that an estimation of a_c
does not depend much on the boundary conditions if only we base our
judgment on large enough systems. This is confirmed in the inset of
Fig. 4, where a_c = 4.53 is obtained by finite size scaling, i.e.
plotting G(x) where x = N^{1/ν}(a - a_c). However, it is clearly easier
to make precise estimates of the critical connectivity with than
without continuous boundary conditions. We note in passing that the
exponent ν = 3 is equal to the value of ν found in random graphs [13].



The Critical Connectivity 

With numerically obtained knowledge of G(a), it is possible to extract
a_c. The procedure is simple. By inspection of Fig. 5 we can estimate
a_c for d ≤ 5. To obtain further data points we have run our algorithm
on RGGs with N = 4^10 for systems of larger dimensions as well. Though
this results in increased runtime per graph, the results get more
homogeneous and fewer runs are needed in order to get a decent estimate
of G_d(a). Our findings presented in Table 1 and Fig. 6 strongly
suggest that

    a_c(d) = a_c(\infty) + A d^{-\gamma},    (9)

where a_c(∞) = 1, γ = 1.74(2) and A = 11.78(5). As expected, Eq. (9)
predicts that a_c(∞) is equal to a_c in random graphs, confirming that
RGGs and random graphs become more and more similar as d increases.
However, when we derive the cluster coefficient, we will see that this
is not true in all respects.

d      2      3      4      5      6      7      8
a_c    4.52   2.74   2.06   1.72   1.51   1.39   1.30
±      0.01   0.01   0.02   0.02   0.02   0.02   0.02

Table 1: The critical connectivity a_c in random geometric graphs of
dimension d with continuous boundary conditions. The data are plotted
in Fig. 6. The estimated errors in a_c in the last row are rather
conservative.
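As a quick consistency check, the power law with a_c(∞) = 1, A = 11.78 and γ = 1.74 can be evaluated against the tabulated thresholds. The short Python sketch below does this; the function name and the dictionary holding the Table 1 values are introduced here for illustration only.

```python
def critical_connectivity(d, a_inf=1.0, A=11.78, gamma=1.74):
    """Power-law form a_c(d) = a_c(inf) + A * d**(-gamma) with the quoted parameters."""
    return a_inf + A * d ** (-gamma)


# Measured thresholds and quoted errors from Table 1, keyed by dimension d.
table_1 = {
    2: (4.52, 0.01), 3: (2.74, 0.01), 4: (2.06, 0.02), 5: (1.72, 0.02),
    6: (1.51, 0.02), 7: (1.39, 0.02), 8: (1.30, 0.02),
}

for d, (a_c, err) in table_1.items():
    fit = critical_connectivity(d)
    print(f"d={d}: measured {a_c:.2f} +/- {err:.2f}, power law {fit:.3f}")
```

With these parameters the power law stays within about 0.02 of every tabulated value, and it tends to 1 as d grows.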

Finally, we note that our findings are in accordance with the most
precise estimates that we know of: a_c = 4.51223(5) [38] and
a_c = 2.734(6) [39] in 2D and 3D, respectively, obtained by the use of
finite size scaling. For d > 3 we have not been able to find any
estimates of a_c to compare with.

The Distribution of Cluster Sizes 

Having examined the size of the largest cluster and the critical
connectivity, we now look at the distribution of cluster sizes in RGGs.

Figure 7: The distribution of cluster sizes in 3D random geometric
graphs with N = 1000 vertices in the vicinity of the critical
connectivity a_c = 2.74. The inset shows that for a ≈ a_c the cluster
sizes are given by a power-law. For each value of a the data points are
based on 10^6 graphs.

The inset illustrates the scale free power-law 
distribution at a = 2.6. Right below a c , clusters 
of all sizes can be encountered. The small hump 
at large cluster sizes is always present because 
the clusters cannot contain more than all of the 
vertices. The clusters pile up when their size 
approaches this boundary, in this case a cluster 
size of 1000, just below the inevitable cut-off. 

Our simulations show that for a significantly below a_c the
distribution is approximately exponential. As the connectivity
increases the distribution becomes power-law-like. As a is further
increased the distribution is separated in two parts; there are no
clusters of medium size, only the largest macroscopic cluster and a few
small ones around it. We have observed this overall behaviour in all
our tests of the distribution of cluster sizes in various dimensions.

Fig. 7 shows our data in 3D. For a = 2.1 (•) the data points lie on an
almost straight line, indicating an exponential distribution.
Increasing the connectivity to a = 2.4 (△) results in a broader
distribution that is no longer exponential. Right at the critical
connectivity (o) the distribution flattens out. Clusters of all sizes
are observed. Right above a_c (o) two separate regions begin to
materialise. Already at a = 3.3 (*) the largest cluster makes it highly
unlikely that a cluster of medium size can be present as well. The
distribution is cut in two.

The Cluster Coefficient 

In network theory the cluster coefficient C is an often calculated
quantity [21, 23], which is defined in the following way. Let the
vertices i and j be connected directly to a common vertex k. C is then
the probability that vertex i and vertex j are directly connected as
well. From this we see that the cluster coefficient is a measure of the
'cliquishness' of the graph. In this section we derive C = C_d
analytically in arbitrary dimensions d, showing that C_d decreases in
an exponential fashion.

To determine C_d we make use of the concept of the excluded volume
V_ex. If we again use the vertices i, j, and k, then i and j must both
be within the excluded volume of k. Put differently, the probability
that i and j are connected is equal to the probability that two
randomly chosen points in a sphere of volume V_ex and radius R are less
than a distance R apart. In other words, given the coordinates of
vertex i, the probability that there is an edge between i and j is
equal to the fraction of the excluded volume of vertex i that lies
inside the excluded volume of k. By averaging over all points in V_ex
we get the cluster coefficient C_d.
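The geometric definition above lends itself to a direct numerical check: draw two points uniformly inside a sphere of radius R and count how often they lie closer than R. The Python sketch below does this for R = 1 (the result does not depend on R). It is a Monte Carlo illustration with hypothetical function names, not the analytical derivation given in the paper.

```python
import math
import random


def random_point_in_unit_ball(d, rng):
    """Uniform point in the d-dimensional unit ball: Gaussian direction, radius U^(1/d)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    r = rng.random() ** (1.0 / d)
    return [r * x / norm for x in g]


def cluster_coefficient_mc(d, samples=100000, seed=0):
    """Estimate C_d as the probability that two neighbours of k are themselves connected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        p = random_point_in_unit_ball(d, rng)  # vertex i, inside the excluded volume of k
        q = random_point_in_unit_ball(d, rng)  # vertex j, inside the excluded volume of k
        if sum((x - y) ** 2 for x, y in zip(p, q)) < 1.0:
            hits += 1
    return hits / samples
```

In one dimension the estimate can be compared with the elementary value 3/4 (two uniform points on an interval of length 2R are closer than R with probability 3/4), and in two dimensions with 1 - 3√3/(4π) ≈ 0.587. For larger d the estimate decreases rapidly.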

The task of calculating C_d is considerably simplified by the spherical
symmetry of the problem. The fractional volume 'overlap' of two spheres
only depends on the distance r between the centers and not on any
angular parts, i.e. ρ_d = ρ_d(r). In general, the cluster coefficient
can therefore be written as

    C_d = \frac{1}{V_{ex}} \int_{V_{ex}} \rho_d(r) \, dV.    (10)

In Appendix A we derive that

    [...]    (11)

When d is large, Eq. (11) reduces to (see Appendix A)

    [...]    (13)

The cluster coefficient is plotted in Fig. 8 (o) together with the
asymptotic solution in Eq. (13) (full line).

Figure 8: The cluster coefficient C in random geometric graphs. The
full line is the asymptotic solution, Eq. (13), valid for large d only.

Eq. (11) shows that the cluster coefficient is a purely geometric
quantity depending only on the dimension d; neither the connectivity a
nor the system size N is present. In random graphs C = a/N, since there
is by definition no correlation between edges. So, in contrast to what
is usually believed, RGGs are not identical to random graphs when
d → ∞.

In higher dimensions, the cluster coefficient in 
RGGs becomes exceedingly small. This peculiar 
fact can be explained by noting that the distribu- 
tion of distances between two connected vertices 
gets more and more peaked at the maximal dis- 
tance R as d increases. This implies that if the 
vertices i and j are both connected to vertex k 
in a high-dimensional space, then it is highly un- 
likely that i and j are directly connected by an 
edge as well. Only in low dimensions are RGGs 
dominated by small loops. On the contrary, the 
way that a standard random graph is designed 
implies a cluster coefficient which can only be 
interpreted statistically, and not geometrically. 
Despite the fact that a c = 1 in both random 
graphs and RGGs of infinite dimensionality, they 
do not have the same topology. 

Graph Bi-partitioning 

Random geometric graphs are useful outside net- 
work modeling and percolation theory as well. 
In this section we look at RGGs in relation to 
graph bi-partitioning, a well known problem in 
combinatorial optimization. 

The NP-hard problem of partitioning a graph with N vertices in two
subsets with N/2 vertices each, in such a way that the cutsize E, i.e.
the number of edges between vertices in different subsets, is
minimized, is called the graph bi-partitioning (GBP) problem. Fig. 2
illustrates a bi-partitioned RGG, where N/2 of the points are marked by
squares, the other half being dots.

d          2      3      4      5
a_c^GBP    4.52   2.84   2.275  1.99
±          0.02   0.01   0.005  0.005

Table 2: The critical connectivity a_c^GBP in random geometric graphs
with toroidal boundary conditions. Only in 2D does a_c^GBP depend
noticeably on N for N > 1000 (see Fig. 5). Note that without continuous
boundaries Fig. 4 shows that a_c^GBP is highly size-dependent for
d > 2. The estimated errors in a_c^GBP in the last row are on the safe
side.

The GBP problem of RGGs with open boundary conditions has been tested
by various heuristics [41, 42, 43]. In this section we use our
numerical findings to establish the critical connectivity in relation
to GBP. Additionally, for a > a_c^GBP we argue that the cutsize E
depends on N and a in a simple way.

In GBP the connectivity is critical when G = 1/2. As soon as the
largest cluster contains more than half of the vertices, it becomes
impossible to bipartition the graph without violating any edges. For
random graphs Eq. (3) immediately gives us a_c^GBP = 2 ln 2 ≈ 1.386.

In RGGs a_c^GBP(d) can be extracted in the same way as a_c was in
Section 4. Our numerical findings in RGGs with continuous boundary
conditions are presented in Table 2. We stress that the results are
valid only for large N, as a closer look at Fig. 5 reveals. In 2D the
average fraction of vertices in the largest cluster is independent of N
only for a > a_c^GBP. This means that if one looks at GBP in 2D with
N = 1000, one cannot use the value of a_c^GBP in Table 2. In higher
dimensions the interval around a_c where G_d(a) is size-dependent gets
smaller and does not play a role in relation to GBP.

With open boundary conditions the picture is messy, as Fig. 4 shows. In
this case G(a) is highly N-dependent, and it is not possible to speak
of a critical connectivity a_c^GBP without specifying N. This is true
despite the fact that G(a) is an averaged quantity, i.e. for small N a
fraction of the graphs will contain a cluster with more than N/2
vertices even when a < a_c^GBP. Fig. 4 clearly shows that a_c^GBP is a
decreasing function of N for d > 2. In 2D, however, all curves cross at
almost the same (pivotal) point, and it is reasonable to speak of
a_c^GBP without specifying N. As the inset in Fig. 4 shows, this would
lead to an estimate of a_c^GBP = 4.53(1), close to a_c^GBP in RGGs with
toroidal boundary conditions.

The size of the largest cluster near a_c grows so rapidly in 2D that
a_c = a_c^GBP cannot be ruled out on the basis of our numerical data.
This is true with both open and continuous boundary conditions.
However, as this would imply that the phase transition is of 1st order
in 2D only, we believe that the two critical connectivities are close
but not identical.

When bi-partitioning a RGG, it is obvious that the 'area of contact'
[44] between the two subsets in the optimal configuration must be close
to a minimum. In 2D this means that the best achievable partition must
be close to simply cutting the graph in two at the coordinate values
x_1 = 1/2 or x_2 = 1/2. This observation is especially relevant for
large connectivities where the cutsize is, fluctuations neglected,
proportional to the length of the dividing line. All this suggests that
we can tentatively determine how the cutsize E in GBP behaves as a
function of N and a by looking at RGGs partitioned at x_i = 1/2, where
1 ≤ i ≤ d. As we are about to argue, we expect a scaling relation like
[46]

    E_d \propto N^{1/\nu} a(d)^{\beta},    (14)

where the exponents ν and β only depend on the dimension of the RGG.

The exponents in Eq. (14) can be determined in the following way. Given
the radius R of the excluded volume of each vertex, the cutsize must be
proportional to NR, since only vertices with 1/2 - R < x_i < 1/2
contribute to the cutsize (to avoid counting the violated edges twice
we only look at the vertices at one side of the partitioning plane at
x_i = 1/2), times the average number of violated edges per vertex in
this region, which is proportional to NR^d. In other words,

    E_d \propto N^2 R^{d+1}.    (15)

If instead of R we want to express the result in terms of
a(d) ∝ NR^d, we get

    \frac{1}{\nu} = 1 - \frac{1}{d}, \qquad \beta = 1 + \frac{1}{d}.    (16)

Since E ∝ N^2 in Eq. (15), the relation 1/ν + β = 2 holds in arbitrary
dimensions.
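The step from the cutsize written in terms of R to the exponents can be made explicit. The short derivation below only uses the two proportionalities stated in the text, E_d ∝ N^2 R^{d+1} and a ∝ N R^d, and matches powers of N and a against the scaling form E_d ∝ N^{1/ν} a^β.

```latex
\begin{align*}
a \propto N R^{d}
  &\;\Rightarrow\; R \propto \left(\frac{a}{N}\right)^{1/d}, \\
E_d \propto N^{2} R^{d+1}
  &\;\Rightarrow\; E_d \propto N^{2} \left(\frac{a}{N}\right)^{(d+1)/d}
   = N^{\,1 - 1/d}\, a^{\,1 + 1/d}, \\
\frac{1}{\nu} = 1 - \frac{1}{d}, \quad \beta = 1 + \frac{1}{d}
  &\;\Rightarrow\; \frac{1}{\nu} + \beta = 2 .
\end{align*}
```

In 2D this gives 1/ν = 1/2 and β = 3/2.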

Now, it is obvious that the scaling Ansatz is reasonable only for
a > a_c^GBP. As Fig. 2 illustrates, the optimal partition at
a ≈ a_c^GBP is highly complex and not at all close to a straight line.
If we incorporate that E = 0 for a < a_c^GBP and replace Eq. (14) with

    E_d \propto N^{1/\nu} \left( a(d) - a_c^{GBP} \right)^{\beta},    (17)

we do not expect Eq. (16) to hold if we focus only on a region near the
critical connectivity. By the use of extremal optimization, a heuristic
that works particularly well near phase transitions in hard
combinatorial problems, Boettcher and Percus have found a_c^GBP ≈ 4.1,
1/ν ≈ 0.6 and β ≈ 1.4 in 2D for 4 < a < 6, not far off our estimates in
Eq. (16) valid for large connectivities. Note that the low estimate of
a_c^GBP is expected; the algorithm does not always find the best
partition, and some graphs with a < a_c do have E > 0.

5 Implementation 

The implementation is of major importance when studying random
geometric graphs, since a straightforward check of all possible edges
between the N points will result in unfeasible runtimes O(N^2). We now
outline how our program works and describe how to avoid runtimes
O(N^2).

The main idea is to divide and conquer. Par- 
tition the d-dimensional box in smaller subboxes 
and determine which subbox each vertex belongs 
to. Given the connectivity and thereby the ra- 
dius R of the excluded volume, for each vertex 
we then only have to look for potential edges to 
vertices in the subboxes adjacent to the subbox 
where the vertex itself is located. This leads to 
a huge reduction in the number of comparisons. 
And this just gets better when N increases, resulting in a decrease in
R as we saw in Fig. 3. By partitioning the box further as N increases
we avoid a linear increase in the number of comparisons per vertex,
which would lead to the undesirable O(N^2) growth.

The algorithm used when looking at RGGs is simple. It works like this:

1. Generate d coordinates for each vertex.

2. Partition the space in small subboxes.

3. Find the edges.

4. Calculate the relevant quantities (G, cluster sizes etc.) as a
   increases.

Obviously, a trade-off in Step 2 is involved when choosing the number
of small boxes.

Figure 9: The runtimes (on a 400MHz SUN) of the algorithms used in
Sections 2 and 4. The straight lines indicate t ~ N^β, where β = 1.2 in
2D, β = 1.33 in 5D and β = 1.15 in random graphs (RG).
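Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration of the subbox idea and not the authors' C program: the function names, the choice of subbox side (the largest 1/m that is still at least R), and the dictionary-based storage are all assumptions made here. The periodic variant uses the minimum-image distance and assumes R < 1/2.

```python
import itertools
import random
from collections import defaultdict


def random_points(n, d, seed=0):
    """Step 1: n vertices with d coordinates, each uniform on the unit interval."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(d)) for _ in range(n)]


def dist2(p, q, periodic=False):
    """Squared distance; with periodic boundaries the shortest image is used."""
    total = 0.0
    for x, y in zip(p, q):
        dx = abs(x - y)
        if periodic:
            dx = min(dx, 1.0 - dx)
        total += dx * dx
    return total


def rgg_edges_cells(points, R, periodic=False):
    """Steps 2 and 3: bin the vertices in subboxes of side >= R, test adjacent boxes only."""
    d = len(points[0])
    m = max(1, int(1.0 / R))  # subboxes per axis, so the side 1/m is at least R
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(min(int(x * m), m - 1) for x in p)].append(idx)
    offsets = list(itertools.product((-1, 0, 1), repeat=d))
    R2 = R * R
    edges = set()  # a set, since wrapped neighbour boxes may coincide when m < 3
    for key, members in cells.items():
        for off in offsets:
            nb = tuple(k + o for k, o in zip(key, off))
            if periodic:
                nb = tuple(k % m for k in nb)
            elif any(k < 0 or k >= m for k in nb):
                continue  # open boundaries: no subbox beyond the edge of the box
            for i in members:
                for j in cells.get(nb, ()):
                    if i < j and dist2(points[i], points[j], periodic) < R2:
                        edges.add((i, j))
    return edges
```

Because each vertex is only compared with vertices in the 3^d surrounding subboxes, the number of distance tests per vertex stays roughly constant when the number of subboxes grows with N. The trade-off in Step 2 shows up in the choice of `m`: more subboxes mean fewer candidates per box but more bookkeeping.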

Being the most time consuming part of the algorithm, Step 3 is the main
contributor when deciding how the runtime depends on N. The runtimes
for most of our runs are shown in Fig. 9. We see that the runtime is
O(N^β), where β ≈ 1.3, resulting in 'feasible' runtimes for graphs with
up to N > 4 · 10^6. Note that the runtime of the much simpler algorithm
used on random graphs also grows like a power-law with β = 1.15, even
though the number of operations is clearly O(N). In fact, the number of
comparisons with potential neighbours per vertex is very nearly
constant in our implementation, i.e. the total number of neighbour
tests is O(N) in RGGs as well. Of course, this is only possible if the
number of subboxes also increases with N. Managing the partitioning
part of the algorithm adds to the runtime. To sum up, the power-law
increase in the runtime illustrated in Fig. 9 for both random graphs
and RGGs is probably mainly due to cache misses. The slightly higher
values of β in the RGGs stem from the additional time used when
partitioning the d-dimensional box into smaller boxes.

Step 4 is worth a comment. When running the algorithm we are interested in information at certain values of $\alpha$. Instead of generating a new graph for every data point needed, we first set up the graph with the minimal connectivity we want to look at. This is easily accomplished with our algorithm. Given an $\alpha$-window $[\alpha_{\min}, \alpha_{\max}]$ in which we want to examine the graph, we find all the edges belonging to the graph when $\alpha = \alpha_{\max}$, but we only add the edges corresponding to $\alpha = \alpha_{\min}$. The rest of the edges, those which are to be added when $\alpha$ is gradually increased to $\alpha_{\max}$, are stored in a priority queue. It is then a simple task to increase $\alpha$ as one wishes. As mentioned earlier, in Figs. [| and [| each curve is based upon 300 data points, i.e. $\Delta\alpha = 0.005$.
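The incremental scheme of Step 4 can be sketched with a binary min-heap as the priority queue and a union-find forest that tracks cluster sizes. This is a minimal sketch under stated assumptions, not the authors' code: the union-find structure is a choice made here, each edge is assumed to carry the connectivity `alpha` at which it first appears (how that value is computed from the edge length is left to the caller), and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { double alpha; int u, v; } Edge;  /* present once connectivity >= alpha */

typedef struct {
    Edge *heap; int count, cap;   /* binary min-heap ordered by alpha */
    int *parent, *size;           /* union-find forest with cluster sizes */
    int largest;                  /* size of the largest cluster so far */
} Sweep;

Sweep *sweep_new(int n_vertices, int max_edges)
{
    assert(n_vertices > 0 && max_edges > 0);
    Sweep *s = malloc(sizeof *s);
    assert(s);
    s->heap = malloc((size_t)max_edges * sizeof *s->heap);
    s->parent = malloc((size_t)n_vertices * sizeof *s->parent);
    s->size = malloc((size_t)n_vertices * sizeof *s->size);
    assert(s->heap && s->parent && s->size);
    for (int i = 0; i < n_vertices; i++) { s->parent[i] = i; s->size[i] = 1; }
    s->count = 0; s->cap = max_edges; s->largest = 1;
    return s;
}

void sweep_free(Sweep *s)
{
    free(s->heap); free(s->parent); free(s->size); free(s);
}

/* Store an edge that is to be added later (sift-up insertion). */
void sweep_push(Sweep *s, Edge e)
{
    assert(s->count < s->cap);
    int i = s->count++;
    while (i > 0 && s->heap[(i - 1) / 2].alpha > e.alpha) {
        s->heap[i] = s->heap[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    s->heap[i] = e;
}

/* Remove and return the edge with the smallest alpha (sift-down). */
static Edge sweep_pop(Sweep *s)
{
    Edge top = s->heap[0], last = s->heap[--s->count];
    int i = 0;
    for (;;) {
        int c = 2 * i + 1;
        if (c >= s->count) break;
        if (c + 1 < s->count && s->heap[c + 1].alpha < s->heap[c].alpha) c++;
        if (s->heap[c].alpha >= last.alpha) break;
        s->heap[i] = s->heap[c];
        i = c;
    }
    s->heap[i] = last;
    return top;
}

/* Root of the cluster containing a, with path halving. */
static int sweep_find(Sweep *s, int a)
{
    while (s->parent[a] != a) {
        s->parent[a] = s->parent[s->parent[a]];
        a = s->parent[a];
    }
    return a;
}

/* Raise the connectivity to alpha: add every stored edge with a threshold
   <= alpha and return the size of the largest cluster. */
int sweep_advance(Sweep *s, double alpha)
{
    while (s->count > 0 && s->heap[0].alpha <= alpha) {
        Edge e = sweep_pop(s);
        int a = sweep_find(s, e.u), b = sweep_find(s, e.v);
        if (a == b) continue;                        /* already connected */
        if (s->size[a] < s->size[b]) { int t = a; a = b; b = t; }
        s->parent[b] = a;                            /* union by size */
        s->size[a] += s->size[b];
        if (s->size[a] > s->largest) s->largest = s->size[a];
    }
    return s->largest;
}
```

Calling `sweep_advance` repeatedly with increasing values of `alpha` yields one data point per call without ever rebuilding the graph.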

The source code, written in C, is available 
upon request. For a more accurate and tech- 
nical discussion of fast algorithms in relation to 
RGGs, see e.g. 



6 Summary 

In this paper we have illustrated the usefulness of random geometric graphs in network theory and how to implement them efficiently. Several properties of random geometric graphs in the vicinity of the critical connectivity $\alpha_c$ have been analysed. We have determined the size of the largest cluster numerically and shown that $\alpha_c(d)$ approaches the value $\alpha_c(\infty) = 1$ found in random graphs in a power-law fashion. We have verified that the distribution of cluster sizes is cut in two just when the connectivity becomes larger than $\alpha_c$. Interestingly, the derivation of the cluster coefficient shows that, even in the limit of infinite dimensionality d, random geometric graphs are not identical to random graphs.

Random geometric graphs share properties 
with both lattice models and standard random 
graphs. Random geometric graphs allow us to 
work with random graphs with a local structure. 
In addition, it is straightforward to add 'long' 
edges if one wishes to simulate, e.g., a small 
world network. With all this in mind, we hope 
this paper will make random geometric graphs 
more widely used in network theory. 



A Derivation of $C_d$

In order to determine the cluster coefficient for arbitrary d, one must find the fractional overlap $\rho_d$. Since $\rho_d$ has no angular dependence, Eq. (10) reduces to

$$C_d = \frac{d}{R^d} \int_0^R \rho_d(r)\, r^{d-1}\, dr. \qquad (18)$$



Since $\rho_1 = 1 - r/(2R)$, $C_1 = 3/4$. From Fig. 10 we see that in 2D the overlapping area, i.e. the area circumscribed by the fat lines, is $2(A - B)$, where $A$ is the area of the part of the circle swept out by the angle $\theta = 2\arccos(r/2R)$ between the two dashed lines originating from the center of the lowest circle, and $B$ is the area of the dashed triangle. Now, $A = \frac{1}{2}\theta R^2$ and $B = R^2 \cos(\theta/2)\sin(\theta/2) = \frac{1}{2}R^2 \sin\theta$. The area of the overlap is then $R^2(\theta - \sin\theta)$, so $\rho_2 = \frac{1}{\pi}(\theta - \sin\theta)$ and $C_2 = 1 - \frac{3\sqrt{3}}{4\pi}$.

Figure 10: Determination of the cluster coefficient C, which in 2D is equal to the average fractional area overlap of the two circles. R is the radius of the circles and r the distance between their centers. The area of the overlap is confined within the fat arcs originating from the two circles (dotted). The dashed lines are helpful in the derivation of the overlap; see the text.

For $d \geq 3$, the use of cylindrical coordinates and the relation

$$2\pi \prod_{i=2}^{n-1} \int_0^{\pi} \sin^{i-1}\theta\, d\theta = \frac{2\pi^{n/2}}{\Gamma\left(\frac{n}{2}\right)} \qquad (19)$$

results in

$$\rho_d(r) = \frac{2}{\sqrt{\pi}}\, \frac{\Gamma\left(\frac{d+2}{2}\right)}{\Gamma\left(\frac{d+1}{2}\right)} \int_0^{\arccos\left(\frac{r}{2R}\right)} \sin^d\theta\, d\theta. \qquad (20)$$

By reversing the order of integration in $C_d$ we get

$$C_d = \frac{3}{\sqrt{\pi}}\, \frac{\Gamma\left(\frac{d+2}{2}\right)}{\Gamma\left(\frac{d+1}{2}\right)} \int_0^{\pi/3} \sin^d\theta\, d\theta, \qquad (21)$$

which can be solved by integration by parts. The use of the duplication formula for the Gamma function then finally leads to Eq. (11).

For large d, the ratio of the Gamma functions in Eq. (11) is given by Stirling's approximation. By putting $x = \cos\theta - 1/2$, the cluster coefficient can therefore be written as

$$C_d \propto \int_0^{1/2} \exp\left[\frac{d-1}{2} \ln f(x)\right] dx, \qquad (22)$$

where $f(x) = 1 - \frac{4x}{3}(1 + x)$. Since the contributions to the integral for large d are significant only when $x \approx 0$, $\ln f$ can be expanded to 1st order and Eq. (13) is recovered.